In [1]:
import warnings
warnings.filterwarnings("ignore")

10X Visium Spatial Transcriptomics: Preprocessing and Quality Control¶

This notebook performs initial loading and quality control (QC) for multiple 10X Visium spatial transcriptomics samples.

Workflow:

  1. Setup: Import libraries and configure paths and parameters.
  2. Loop through samples: a. Load sample data (filtered_feature_bc_matrix.h5 and spatial information) using scanpy.read_visium. b. Ensure gene names (var_names) are unique. c. Verify spatial information loaded correctly. d. Perform Quality Control: i. Calculate QC metrics (total counts, genes per spot, mitochondrial percentage). ii. Generate and save pre-filtering QC plots (violin, scatter). iii. Determine dynamic filtering thresholds using Median Absolute Deviation (MAD). iv. Filter low-quality spots based on thresholds. v. Generate and save post-filtering QC plots (violin, spatial map showing filtered spots). e. Save the processed AnnData object (.h5ad) for the sample.
  3. Conclusion: Summarize outputs.

1. Setup: Imports and Configuration¶

In [2]:
import scanpy as sc
import squidpy as sq # Often used alongside scanpy for spatial analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import copy
import traceback
import warnings
import os
from scipy import stats # For Median Absolute Deviation (MAD)

Configuration Parameters¶

Set the paths to your data, define sample IDs, and specify QC parameters.

In [3]:
data_dir = Path("./GSE283269_RAW/")
sample_ids = ["Autopsy1", "Autopsy2", "Autopsy3", "Explanted1", "Explanted2", "Explanted3"] # Your sample IDs
count_file_name = "filtered_feature_bc_matrix.h5"
spatial_folder_name = "spatial"
tissue_positions_file_name = "tissue_positions.csv"
scalefactors_file_name = "scalefactors_json.json"
hires_image_name = "tissue_hires_image.png"
lowres_image_name = "tissue_lowres_image.png"
output_adata_dir = Path("./101_Preprocess_processed_adata")
qc_figures_dir = Path("./101_Preprocess_qc_figures")
output_adata_dir.mkdir(parents=True, exist_ok=True)
qc_figures_dir.mkdir(parents=True, exist_ok=True)
mt_gene_prefix = 'MT-'
mad_threshold_n_genes = 3.0
mad_threshold_total_counts = 3.0
mad_threshold_mt_pct = 3.0
min_gene_threshold = 50
min_count_threshold = 100
max_mt_threshold = 25.0

2. Processing Loop: Load, QC, Save¶

Iterate through each sample_id defined in the configuration.

In [4]:
adatas_processed = {} # Store processed adatas if needed later
individual_uns_spatial = {} # Store the .uns['spatial'] separately (as before)

print("Loading individual samples, applying QC, and saving...")
for sample_id in sample_ids:
    print(f"--- Processing {sample_id} ---")
    sample_path = data_dir / sample_id
    # Using the specified file/folder names from config
    count_file_path = sample_path / count_file_name
    spatial_path = sample_path / spatial_folder_name
    tissue_positions_path = spatial_path / tissue_positions_file_name
    scalefactors_path = spatial_path / scalefactors_file_name

    # --- File Checks ---
    if not (sample_path.is_dir() and count_file_path.is_file() and spatial_path.is_dir() and
            tissue_positions_path.is_file() and scalefactors_path.is_file()):
         print(f"  ERROR: One or more essential files missing for {sample_id}. Check paths/names:")
         print(f"    - Sample Path: {sample_path} (Exists: {sample_path.is_dir()})")
         print(f"    - Count File: {count_file_path} (Exists: {count_file_path.is_file()})")
         print(f"    - Spatial Folder: {spatial_path} (Exists: {spatial_path.is_dir()})")
         print(f"    - Tissue Positions: {tissue_positions_path} (Exists: {tissue_positions_path.is_file()})")
         print(f"    - Scalefactors: {scalefactors_path} (Exists: {scalefactors_path.is_file()})")
         print("  Skipping.")
         continue

    try:
        print(f"  1. Loading {sample_id} using scanpy.read_visium...")
        # Suppress FutureWarning temporarily if desired
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message=".*squidpy.*", category=FutureWarning)
            warnings.filterwarnings("ignore", message="Variable names are not unique.", category=UserWarning)
            adata_sample = sc.read_visium(
                path=sample_path,
                count_file=count_file_path.name, # Pass only filename relative to path
                source_image_path=spatial_path / "tissue_hires_image.png", # Assuming hires image name
                library_id=sample_id, # Important for uns structure
                load_images=True,
            )
        print(f"     Initial load: {adata_sample.n_obs} spots, {adata_sample.n_vars} genes")

        # --- Make var_names unique ---
        if not adata_sample.var_names.is_unique:
            print("  2. Making var_names unique...")
            adata_sample.var_names_make_unique()
        else:
             print("  2. Var_names are already unique.")

        # --- Verify loading & Store .uns['spatial'] ---
        # (Your original verification logic is good, adapted slightly)
        print(f"  3. Verifying spatial metadata for library_id '{sample_id}'...")
        obsm_ok = 'spatial' in adata_sample.obsm
        uns_spatial_content = None
        uns_ok = False

        if 'spatial' in adata_sample.uns and isinstance(adata_sample.uns['spatial'], dict):
             if sample_id in adata_sample.uns['spatial']:
                  uns_spatial_content = adata_sample.uns['spatial'][sample_id]
                  if isinstance(uns_spatial_content, dict) and \
                     'images' in uns_spatial_content and \
                     'scalefactors' in uns_spatial_content:
                       uns_ok = True
                       print(f"     Metadata loaded and verified in .uns['spatial']['{sample_id}']: True")
                       # Store a DEEP COPY for later manual merging if needed
                       individual_uns_spatial[sample_id] = copy.deepcopy(uns_spatial_content)
                  else:
                       print(f"     WARNING: Metadata structure incorrect within .uns['spatial']['{sample_id}']. Content: {uns_spatial_content}")
             else:
                  print(f"     ERROR: Library ID '{sample_id}' NOT FOUND as key within .uns['spatial']. Keys found: {list(adata_sample.uns['spatial'].keys())}")
        else:
             print(f"     ERROR: Metadata key 'spatial' NOT FOUND in .uns or is not a dictionary.")

        print(f"     Coordinates loaded (.obsm['spatial']): {obsm_ok}")

        if not (obsm_ok and uns_ok):
            print(f"  ERROR: Failed to load spatial coordinates or full/correct metadata for {sample_id}. Skipping QC and saving.")
            continue

        # --- <<< START QUALITY CONTROL >>> ---
        print(f"  4. Performing Quality Control for {sample_id}...")

        # 4a. Calculate Mitochondrial Gene Percentage
        print(f"     Calculating MT percentage using prefix '{mt_gene_prefix}'...")
        adata_sample.var['mt'] = adata_sample.var_names.str.startswith(mt_gene_prefix)
        sc.pp.calculate_qc_metrics(adata_sample, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
        n_mt_genes = adata_sample.var['mt'].sum()
        if n_mt_genes == 0:
             print(f"     WARNING: No mitochondrial genes found with prefix '{mt_gene_prefix}'. Check the prefix or gene names. MT% filtering will be skipped.")
             # Handle case where no MT genes are found - maybe skip MT filtering or use a dummy value
             adata_sample.obs['pct_counts_mt'] = 0.0 # Assign 0 if no MT genes found
        else:
             print(f"     Found {n_mt_genes} mitochondrial genes.")


        # 4b. Generate Pre-filtering QC Plots
        print("     Generating pre-filtering QC plots...")
        fig, axs = plt.subplots(1, 3, figsize=(15, 4))
        sc.pl.violin(adata_sample, ['n_genes_by_counts'], jitter=0.4, ax=axs[0], show=False)
        sc.pl.violin(adata_sample, ['total_counts'], jitter=0.4, ax=axs[1], show=False)
        sc.pl.violin(adata_sample, ['pct_counts_mt'], jitter=0.4, ax=axs[2], show=False)
        fig.suptitle(f'QC Metrics Before Filtering - {sample_id}')
        plt.tight_layout(rect=[0, 0, 1, 0.96]) # Adjust layout to prevent title overlap
        plot_path = qc_figures_dir / f"{sample_id}_qc_violin_before_filtering.png"
        plt.show()
        plt.savefig(plot_path)
        print(f"     Saved pre-filter violin plot: {plot_path}")
        plt.close(fig) # Close figure to free memory

        # Scatter plots for relationships
        fig, axs = plt.subplots(1, 2, figsize=(12, 5))
        sc.pl.scatter(adata_sample, x='total_counts', y='pct_counts_mt', ax=axs[0], show=False)
        sc.pl.scatter(adata_sample, x='total_counts', y='n_genes_by_counts', ax=axs[1], show=False)
        fig.suptitle(f'QC Scatter Plots Before Filtering - {sample_id}')
        plt.tight_layout(rect=[0, 0, 1, 0.96])
        plot_path = qc_figures_dir / f"{sample_id}_qc_scatter_before_filtering.png"
        plt.show()
        plt.savefig(plot_path)
        print(f"     Saved pre-filter scatter plot: {plot_path}")
        plt.close(fig)


        # 4c. Determine Dynamic Thresholds using MAD
        print("     Determining dynamic filtering thresholds...")

        # Function to calculate MAD-based threshold (lower bound)
        def get_lower_bound(metric_data, n_mad, fallback_min):
            median_val = np.nanmedian(metric_data)
            mad_val = stats.median_abs_deviation(metric_data, nan_policy='omit')
            threshold = median_val - n_mad * mad_val
            print(f"       - Median: {median_val:.2f}, MAD: {mad_val:.2f}, Raw Lower Threshold ({n_mad} MAD): {threshold:.2f}")
            # Ensure threshold is reasonable and above absolute minimum
            threshold = max(threshold, fallback_min)
            # Ensure threshold is not negative
            threshold = max(threshold, 0)
            print(f"       - Final Lower Threshold (>= {fallback_min}, >= 0): {threshold:.2f}")
            return threshold

        # Function to calculate MAD-based threshold (upper bound)
        def get_upper_bound(metric_data, n_mad, fallback_max):
            median_val = np.nanmedian(metric_data)
            mad_val = stats.median_abs_deviation(metric_data, nan_policy='omit')
            threshold = median_val + n_mad * mad_val
            print(f"       - Median: {median_val:.2f}, MAD: {mad_val:.2f}, Raw Upper Threshold ({n_mad} MAD): {threshold:.2f}")
            # Ensure threshold is reasonable and below absolute maximum
            threshold = min(threshold, fallback_max)
            # Ensure threshold is not > 100 for percentages
            if fallback_max <= 100:
                 threshold = min(threshold, 100.0)
            print(f"       - Final Upper Threshold (<= {fallback_max}, <= 100): {threshold:.2f}")

            # Handle case where MAD is 0 (all values are the same) - use fallback directly
            if mad_val == 0:
                print(f"       - MAD is 0, using fallback max: {fallback_max}")
                return fallback_max
            return threshold

        print("     Calculating threshold for n_genes_by_counts:")
        thresh_n_genes_lower = get_lower_bound(adata_sample.obs['n_genes_by_counts'], mad_threshold_n_genes, min_gene_threshold)

        print("     Calculating threshold for total_counts:")
        thresh_total_counts_lower = get_lower_bound(adata_sample.obs['total_counts'], mad_threshold_total_counts, min_count_threshold)

        thresh_mt_upper = max_mt_threshold # Default to absolute max
        if n_mt_genes > 0 : # Only calculate dynamic threshold if MT genes exist
            print("     Calculating threshold for pct_counts_mt:")
            thresh_mt_upper = get_upper_bound(adata_sample.obs['pct_counts_mt'], mad_threshold_mt_pct, max_mt_threshold)
        else:
            print(f"     Skipping dynamic MT% threshold calculation (no MT genes found). Using fixed max: {max_mt_threshold}")


        # 4d. Filter Spots (Cells/Observations)
        n_obs_before = adata_sample.n_obs
        print(f"     Applying filters: n_genes_by_counts >= {thresh_n_genes_lower}, total_counts >= {thresh_total_counts_lower}, pct_counts_mt <= {thresh_mt_upper}")

        # Create boolean masks for filtering
        obs_mask_genes = adata_sample.obs['n_genes_by_counts'] >= thresh_n_genes_lower
        obs_mask_counts = adata_sample.obs['total_counts'] >= thresh_total_counts_lower
        obs_mask_mt = adata_sample.obs['pct_counts_mt'] <= thresh_mt_upper

        # Combine masks
        adata_sample.obs['pass_qc'] = obs_mask_genes & obs_mask_counts & obs_mask_mt

        # Apply the filter by creating a new view/copy
        adata_filtered = adata_sample[adata_sample.obs['pass_qc']].copy() # Use .copy() crucial here!

        n_obs_after = adata_filtered.n_obs
        n_removed = n_obs_before - n_obs_after
        percent_removed = (n_removed / n_obs_before * 100) if n_obs_before > 0 else 0

        print(f"     Filtering removed {n_removed} spots ({percent_removed:.2f}%). Kept {n_obs_after} spots.")

        # (Optional) Filter Mitochondrial Genes (Variables)
        # Usually done *after* calculating MT% but before downstream analysis like normalization
        n_vars_before = adata_filtered.n_vars
        adata_filtered = adata_filtered[:, ~adata_filtered.var['mt']].copy() # Remove MT genes
        n_vars_after = adata_filtered.n_vars
        print(f"     (Optional) Removed {n_vars_before - n_vars_after} mitochondrial genes.")


        # 4e. Generate Post-filtering QC Plots (Optional but Recommended)
        print("     Generating post-filtering QC plots...")
        fig, axs = plt.subplots(1, 3, figsize=(15, 4))
        sc.pl.violin(adata_filtered, ['n_genes_by_counts'], jitter=0.4, ax=axs[0], show=False)
        sc.pl.violin(adata_filtered, ['total_counts'], jitter=0.4, ax=axs[1], show=False)
        sc.pl.violin(adata_filtered, ['pct_counts_mt'], jitter=0.4, ax=axs[2], show=False)
        fig.suptitle(f'QC Metrics After Filtering - {sample_id}')
        plt.tight_layout(rect=[0, 0, 1, 0.96])
        plot_path = qc_figures_dir / f"{sample_id}_qc_violin_after_filtering.png"
        plt.show()
        plt.savefig(plot_path)
        print(f"     Saved post-filter violin plot: {plot_path}")
        plt.close(fig)

        # Spatial plot showing filtered spots (requires spatial coords)
        fig, ax = plt.subplots(figsize=(8, 7))
        sc.pl.spatial(adata_sample, color=['pass_qc'], library_id=sample_id, title=f'Spots Passing QC - {sample_id}',
                      ax=ax, show=False, legend_loc='right margin', legend_fontsize=8, size=1.2) # Use original adata to show context
        plot_path = qc_figures_dir / f"{sample_id}_qc_spatial_filter_map.png"
        plt.subplots_adjust(right=0.8)
        plt.tight_layout()
        plt.show()
        plt.savefig(plot_path)
        print(f"     Saved spatial filter map: {plot_path}")
        plt.close(fig)


        # --- <<< END QUALITY CONTROL >>> ---


        # --- Save Processed Data ---
        print(f"  5. Saving processed AnnData object for {sample_id}...")
        output_file = output_adata_dir / f"QC_processed_{sample_id}.h5ad"
        # Ensure the AnnData object is clean before saving (remove temporary columns if desired)
        # del adata_filtered.obs['pass_qc'] # Optional cleanup
        adata_filtered.write_h5ad(output_file)
        print(f"     Saved processed data to: {output_file}")
        print(f"     Final dimensions: {adata_filtered.n_obs} spots, {adata_filtered.n_vars} genes")

        # Store the processed adata if needed for later steps in the same script
        # adatas_processed[sample_id] = adata_filtered

        print(f"--- Successfully processed and saved {sample_id} ---\n")

    except Exception as e:
        print(f"  FATAL ERROR during processing for {sample_id}: {e}")
        traceback.print_exc()
        print(f"  Skipping {sample_id} due to fatal error.\n")
        continue

print("="*40)
print("All samples processed.")
Loading individual samples, applying QC, and saving...
--- Processing Autopsy1 ---
  1. Loading Autopsy1 using scanpy.read_visium...
     Initial load: 799 spots, 18085 genes
  2. Making var_names unique...
  3. Verifying spatial metadata for library_id 'Autopsy1'...
     Metadata loaded and verified in .uns['spatial']['Autopsy1']: True
     Coordinates loaded (.obsm['spatial']): True
  4. Performing Quality Control for Autopsy1...
     Calculating MT percentage using prefix 'MT-'...
     Found 11 mitochondrial genes.
     Generating pre-filtering QC plots...
No description has been provided for this image
     Saved pre-filter violin plot: 101_Preprocess_qc_figures/Autopsy1_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Autopsy1_qc_scatter_before_filtering.png
     Determining dynamic filtering thresholds...
     Calculating threshold for n_genes_by_counts:
       - Median: 1463.00, MAD: 538.00, Raw Lower Threshold (3.0 MAD): -151.00
       - Final Lower Threshold (>= 50, >= 0): 50.00
     Calculating threshold for total_counts:
       - Median: 2200.00, MAD: 975.00, Raw Lower Threshold (3.0 MAD): -725.00
       - Final Lower Threshold (>= 100, >= 0): 100.00
     Calculating threshold for pct_counts_mt:
       - Median: 5.47, MAD: 1.69, Raw Upper Threshold (3.0 MAD): 10.53
       - Final Upper Threshold (<= 25.0, <= 100): 10.53
     Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 10.526315689086914
     Filtering removed 173 spots (21.65%). Kept 626 spots.
     (Optional) Removed 11 mitochondrial genes.
     Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved post-filter violin plot: 101_Preprocess_qc_figures/Autopsy1_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved spatial filter map: 101_Preprocess_qc_figures/Autopsy1_qc_spatial_filter_map.png
  5. Saving processed AnnData object for Autopsy1...
     Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Autopsy1.h5ad
     Final dimensions: 626 spots, 18074 genes
--- Successfully processed and saved Autopsy1 ---

--- Processing Autopsy2 ---
  1. Loading Autopsy2 using scanpy.read_visium...
     Initial load: 732 spots, 18085 genes
  2. Making var_names unique...
  3. Verifying spatial metadata for library_id 'Autopsy2'...
     Metadata loaded and verified in .uns['spatial']['Autopsy2']: True
     Coordinates loaded (.obsm['spatial']): True
  4. Performing Quality Control for Autopsy2...
     Calculating MT percentage using prefix 'MT-'...
     Found 11 mitochondrial genes.
     Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter violin plot: 101_Preprocess_qc_figures/Autopsy2_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Autopsy2_qc_scatter_before_filtering.png
     Determining dynamic filtering thresholds...
     Calculating threshold for n_genes_by_counts:
       - Median: 327.00, MAD: 124.00, Raw Lower Threshold (3.0 MAD): -45.00
       - Final Lower Threshold (>= 50, >= 0): 50.00
     Calculating threshold for total_counts:
       - Median: 381.50, MAD: 154.00, Raw Lower Threshold (3.0 MAD): -80.50
       - Final Lower Threshold (>= 100, >= 0): 100.00
     Calculating threshold for pct_counts_mt:
       - Median: 9.62, MAD: 3.03, Raw Upper Threshold (3.0 MAD): 18.71
       - Final Upper Threshold (<= 25.0, <= 100): 18.71
     Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 18.712793350219727
     Filtering removed 37 spots (5.05%). Kept 695 spots.
     (Optional) Removed 11 mitochondrial genes.
     Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved post-filter violin plot: 101_Preprocess_qc_figures/Autopsy2_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved spatial filter map: 101_Preprocess_qc_figures/Autopsy2_qc_spatial_filter_map.png
  5. Saving processed AnnData object for Autopsy2...
     Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Autopsy2.h5ad
     Final dimensions: 695 spots, 18074 genes
--- Successfully processed and saved Autopsy2 ---

--- Processing Autopsy3 ---
  1. Loading Autopsy3 using scanpy.read_visium...
     Initial load: 1741 spots, 18085 genes
  2. Making var_names unique...
  3. Verifying spatial metadata for library_id 'Autopsy3'...
     Metadata loaded and verified in .uns['spatial']['Autopsy3']: True
     Coordinates loaded (.obsm['spatial']): True
  4. Performing Quality Control for Autopsy3...
     Calculating MT percentage using prefix 'MT-'...
     Found 11 mitochondrial genes.
     Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter violin plot: 101_Preprocess_qc_figures/Autopsy3_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Autopsy3_qc_scatter_before_filtering.png
     Determining dynamic filtering thresholds...
     Calculating threshold for n_genes_by_counts:
       - Median: 1085.00, MAD: 469.00, Raw Lower Threshold (3.0 MAD): -322.00
       - Final Lower Threshold (>= 50, >= 0): 50.00
     Calculating threshold for total_counts:
       - Median: 1458.00, MAD: 699.00, Raw Lower Threshold (3.0 MAD): -639.00
       - Final Lower Threshold (>= 100, >= 0): 100.00
     Calculating threshold for pct_counts_mt:
       - Median: 6.61, MAD: 2.51, Raw Upper Threshold (3.0 MAD): 14.14
       - Final Upper Threshold (<= 25.0, <= 100): 14.14
     Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 14.139665603637695
     Filtering removed 412 spots (23.66%). Kept 1329 spots.
     (Optional) Removed 11 mitochondrial genes.
     Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved post-filter violin plot: 101_Preprocess_qc_figures/Autopsy3_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved spatial filter map: 101_Preprocess_qc_figures/Autopsy3_qc_spatial_filter_map.png
  5. Saving processed AnnData object for Autopsy3...
     Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Autopsy3.h5ad
     Final dimensions: 1329 spots, 18074 genes
--- Successfully processed and saved Autopsy3 ---

--- Processing Explanted1 ---
  1. Loading Explanted1 using scanpy.read_visium...
     Initial load: 999 spots, 18085 genes
  2. Making var_names unique...
  3. Verifying spatial metadata for library_id 'Explanted1'...
     Metadata loaded and verified in .uns['spatial']['Explanted1']: True
     Coordinates loaded (.obsm['spatial']): True
  4. Performing Quality Control for Explanted1...
     Calculating MT percentage using prefix 'MT-'...
     Found 11 mitochondrial genes.
     Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter violin plot: 101_Preprocess_qc_figures/Explanted1_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Explanted1_qc_scatter_before_filtering.png
     Determining dynamic filtering thresholds...
     Calculating threshold for n_genes_by_counts:
       - Median: 1327.00, MAD: 812.00, Raw Lower Threshold (3.0 MAD): -1109.00
       - Final Lower Threshold (>= 50, >= 0): 50.00
     Calculating threshold for total_counts:
       - Median: 1839.00, MAD: 1257.00, Raw Lower Threshold (3.0 MAD): -1932.00
       - Final Lower Threshold (>= 100, >= 0): 100.00
     Calculating threshold for pct_counts_mt:
       - Median: 2.77, MAD: 0.99, Raw Upper Threshold (3.0 MAD): 5.74
       - Final Upper Threshold (<= 25.0, <= 100): 5.74
     Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 5.740735054016113
     Filtering removed 149 spots (14.91%). Kept 850 spots.
     (Optional) Removed 11 mitochondrial genes.
     Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved post-filter violin plot: 101_Preprocess_qc_figures/Explanted1_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved spatial filter map: 101_Preprocess_qc_figures/Explanted1_qc_spatial_filter_map.png
  5. Saving processed AnnData object for Explanted1...
     Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Explanted1.h5ad
     Final dimensions: 850 spots, 18074 genes
--- Successfully processed and saved Explanted1 ---

--- Processing Explanted2 ---
  1. Loading Explanted2 using scanpy.read_visium...
     Initial load: 1359 spots, 18085 genes
  2. Making var_names unique...
  3. Verifying spatial metadata for library_id 'Explanted2'...
     Metadata loaded and verified in .uns['spatial']['Explanted2']: True
     Coordinates loaded (.obsm['spatial']): True
  4. Performing Quality Control for Explanted2...
     Calculating MT percentage using prefix 'MT-'...
     Found 11 mitochondrial genes.
     Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter violin plot: 101_Preprocess_qc_figures/Explanted2_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Explanted2_qc_scatter_before_filtering.png
     Determining dynamic filtering thresholds...
     Calculating threshold for n_genes_by_counts:
       - Median: 919.00, MAD: 447.00, Raw Lower Threshold (3.0 MAD): -422.00
       - Final Lower Threshold (>= 50, >= 0): 50.00
     Calculating threshold for total_counts:
       - Median: 1194.00, MAD: 664.00, Raw Lower Threshold (3.0 MAD): -798.00
       - Final Lower Threshold (>= 100, >= 0): 100.00
     Calculating threshold for pct_counts_mt:
       - Median: 1.99, MAD: 0.72, Raw Upper Threshold (3.0 MAD): 4.15
       - Final Upper Threshold (<= 25.0, <= 100): 4.15
     Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 4.149576663970947
     Filtering removed 201 spots (14.79%). Kept 1158 spots.
     (Optional) Removed 11 mitochondrial genes.
     Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved post-filter violin plot: 101_Preprocess_qc_figures/Explanted2_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved spatial filter map: 101_Preprocess_qc_figures/Explanted2_qc_spatial_filter_map.png
  5. Saving processed AnnData object for Explanted2...
     Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Explanted2.h5ad
     Final dimensions: 1158 spots, 18074 genes
--- Successfully processed and saved Explanted2 ---

--- Processing Explanted3 ---
  1. Loading Explanted3 using scanpy.read_visium...
     Initial load: 952 spots, 18085 genes
  2. Making var_names unique...
  3. Verifying spatial metadata for library_id 'Explanted3'...
     Metadata loaded and verified in .uns['spatial']['Explanted3']: True
     Coordinates loaded (.obsm['spatial']): True
  4. Performing Quality Control for Explanted3...
     Calculating MT percentage using prefix 'MT-'...
     Found 11 mitochondrial genes.
     Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter violin plot: 101_Preprocess_qc_figures/Explanted3_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Explanted3_qc_scatter_before_filtering.png
     Determining dynamic filtering thresholds...
     Calculating threshold for n_genes_by_counts:
       - Median: 1871.00, MAD: 1106.00, Raw Lower Threshold (3.0 MAD): -1447.00
       - Final Lower Threshold (>= 50, >= 0): 50.00
     Calculating threshold for total_counts:
       - Median: 2805.50, MAD: 1887.00, Raw Lower Threshold (3.0 MAD): -2855.50
       - Final Lower Threshold (>= 100, >= 0): 100.00
     Calculating threshold for pct_counts_mt:
       - Median: 3.02, MAD: 0.71, Raw Upper Threshold (3.0 MAD): 5.16
       - Final Upper Threshold (<= 25.0, <= 100): 5.16
     Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 5.161240577697754
     Filtering removed 59 spots (6.20%). Kept 893 spots.
     (Optional) Removed 11 mitochondrial genes.
     Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved post-filter violin plot: 101_Preprocess_qc_figures/Explanted3_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
No description has been provided for this image
     Saved spatial filter map: 101_Preprocess_qc_figures/Explanted3_qc_spatial_filter_map.png
  5. Saving processed AnnData object for Explanted3...
     Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Explanted3.h5ad
     Final dimensions: 893 spots, 18074 genes
--- Successfully processed and saved Explanted3 ---

========================================
All samples processed.
<Figure size 640x480 with 0 Axes>

3. Conclusion¶

All specified samples have been processed.

  • QC-filtered AnnData objects are saved as .h5ad files in the directory: ./101_Preprocess_processed_adata
  • QC plots (pre- and post-filtering violins, scatter plots, and spatial filter maps) are saved as .png files in the directory: ./101_Preprocess_qc_figures

These processed files can now be used for downstream analyses such as normalization, clustering, differential expression, and further spatial analysis.

In [5]:
print("\nProcessed AnnData files created:")
for f in sorted(output_adata_dir.glob('*.h5ad')):
    print(f'- {f.name}')

print("\nQC Figure files created:")
for f in sorted(qc_figures_dir.glob('*.png')):
    print(f'- {f.name}')
Processed AnnData files created:
- QC_processed_Autopsy1.h5ad
- QC_processed_Autopsy2.h5ad
- QC_processed_Autopsy3.h5ad
- QC_processed_Explanted1.h5ad
- QC_processed_Explanted2.h5ad
- QC_processed_Explanted3.h5ad

QC Figure files created:
- Autopsy1_qc_scatter_before_filtering.png
- Autopsy1_qc_spatial_filter_map.png
- Autopsy1_qc_violin_after_filtering.png
- Autopsy1_qc_violin_before_filtering.png
- Autopsy2_qc_scatter_before_filtering.png
- Autopsy2_qc_spatial_filter_map.png
- Autopsy2_qc_violin_after_filtering.png
- Autopsy2_qc_violin_before_filtering.png
- Autopsy3_qc_scatter_before_filtering.png
- Autopsy3_qc_spatial_filter_map.png
- Autopsy3_qc_violin_after_filtering.png
- Autopsy3_qc_violin_before_filtering.png
- Explanted1_qc_scatter_before_filtering.png
- Explanted1_qc_spatial_filter_map.png
- Explanted1_qc_violin_after_filtering.png
- Explanted1_qc_violin_before_filtering.png
- Explanted2_qc_scatter_before_filtering.png
- Explanted2_qc_spatial_filter_map.png
- Explanted2_qc_violin_after_filtering.png
- Explanted2_qc_violin_before_filtering.png
- Explanted3_qc_scatter_before_filtering.png
- Explanted3_qc_spatial_filter_map.png
- Explanted3_qc_violin_after_filtering.png
- Explanted3_qc_violin_before_filtering.png